Particularly in genomics, but also in other fields, it has become commonplace to undertake
highly multiple Student's t-tests based on relatively small sample sizes. The literature on this 
topic is continually expanding, but the main approaches used to control the family-wise error 
rate and false discovery rate are still based on the assumption that the tests are independent. 
The independence condition is known to be false at the level of the joint distributions of the test 
statistics, but that does not necessarily mean, for the small significance levels involved in highly 
multiple hypothesis testing, that the assumption leads to major errors. In this paper, we give 
conditions under which the assumption of independence is valid. Specifically, we derive a strong 
approximation that closely links the level exceedences of a dependent "studentized process" to 
those of a process of independent random variables. Via this connection, it can be seen that 
in high-dimensional, low sample-size cases, provided the sample size diverges faster than the 
logarithm of the number of tests, the assumption of independent t-tests is often justified. 

Keywords: false discovery rate; family-wise error rate; genomic data; large deviation 
probability; moving average; Poisson approximation; Student's t-statistic; upper tail 
dependence; upper tail independence 

1. Introduction 

Today it is commonplace to undertake highly multiple hypothesis testing, generally in 
genomics and very often using tests based on Student's t-statistic; see, for example, 
Benjamini and Yekutieli (2001), Efron and Tibshirani (2002), Cui and Churchill (2003), 
Amaratunga and Cabrera (2004), page 114, Scheid and Spang (2005), Shaffer (2005), Fox 
and Dimmic (2006), Hu and Willsky (2006), Qiu and Yakovlev (2006), Efron (2007a),
Liu and Hwang (2007) and van de Wiel and Kim (2007). This popularity of multiple
t-testing also extends to other fields (e.g., Pawluk-Kolc et al. (2006)). The principal
methods used to control the family-wise error rate and false discovery rate are founded
on the assumption of independence among tests. Alternative approaches are generally
based either on Bonferroni bounds, which are unsatisfactory for a variety of reasons (see,
e.g., Perneger (1998)), or on the hope that, despite ample evidence of non-independence 
in terms of correlation analysis, independence can be assumed in practice. 

The latter hope tends to be pinned either on work of Benjamini and Yekutieli (2001),
who argued that in some settings, the absence of independence can give conservative 
results, or on experience with the analysis of financial data, which suggests that in some 
circumstances, it might be reasonable to assume that the upper tails of the test statistics 
are independent, even if the joint distributions are not. Upper tail independence, as it is 
sometimes called (for discussion, see, e.g., Wu (1994), Falk and Reiss (2001), R. Schmidt 
(2002), Li (2006), R. Schmidt and Stadtmüller (2006), T. Schmidt (2007)), is generally
assumed to be non-asymptotic in nature. That is, tails of joint distributions are often 
taken to be perfectly independent beyond a certain threshold. 

However, this type of model is not really appropriate for the analysis of genomic data. 
In particular, it is difficult to determine a biological reason for, or the actual location 
of, a threshold. It is of greater practical interest to consider the possibility that the 
strength of dependence in upper tails could become successively weaker as the number of 
simultaneous tests, and the number of data vectors, increases. If this could be established 
in the context of tests based on Student's t-statistic, it would lend immediate justification
to the often-made assumption (see the articles cited in the first paragraph of this paper)
that highly multiple t-statistics can be taken to be independent.

The present paper will establish such a result. The mechanism for our model involves 
the critical points for tests becoming more extreme as the number, p, of tests diverges 
(in fact, the increase in critical points is a direct consequence of p diverging) so that the 
tests are conducted further into the tails; furthermore, the tails of the distributions of 
test statistics becoming successively lighter as the number of degrees of freedom of the 
test statistics increases. 

We impose particularly weak conditions on the marginal distributions of components. 
In particular, the distributions need only three finite moments. With this assumption, and 
permitting the size, n, of the group sample to increase a little faster than the logarithm 
of the number of tests, it follows from our results that the joint distributions of test 
statistics enjoy an asymptotic form of the upper tail independence property. 

This result would not be so striking if the statistics had normal distributions, but it 
fails for heavy-tailed distributions such as those for which not all moments are finite. 
Of course, Student's t-distribution is itself in this category, yet our results show that
asymptotic independence holds in a particularly strong sense for Student's t-statistic, 
even if it is computed from relatively heavy-tailed data. The reason this is possible is 
that we permit the group sample size to increase at a rate that is just sufficient to convert 
heavy tails to tails that are sufficiently light, to enable approximate independence at high 
levels. 

It can be seen from this property that the availability of upper-tail asymptotic inde- 
pendence is a bonus of working with highly multiple hypothesis testing, that is, with 
"large p and small n" problems. It is not available in more conventional, "small p and 
large n" problems, where there is a very large literature on modelling dependence in 
highly multiple hypothesis testing. 

There is a literature on comparing studentized means when the variances used for
studentizing are computed from pooled data and so are common to each test statistic. 
However, in our experience, that approach is used less frequently, in practice, than the 
"local" standardization treated in the present paper. When using the latter method, each 
mean is divided by the standard deviation of the sample from which it was computed. 
A major motivation is that the true variances may be different in each instance. Even 
if the variances can reasonably be assumed to be the same, it can be desirable to use 
the local approach since it confers greater robustness. For example, when applied to the 
mean alone, rather than its locally studentized form, the large-deviation properties that 
underpin the analysis of high-level exceedences require the data to have lighter tails. 

Statistical literature on highly multiple hypothesis testing is outlined in helpful reviews
by Hochberg and Tamhane (1987), Pigeot (2000), Dudoit et al. (2003), Bernhard
et al. (2004) and Lehmann and Romano (2005), Chapter 9. Benjamini and Hochberg
(1995) introduced an approach, which has become very popular, to the controlling of 
false discovery rates; see also Simes (1986), Hommel (1988), Hochberg (1988), Sarkar and 
Chang (1997), Sarkar (1998), Sen (1999), Hochberg and Benjamini (1990) and Lehmann 
et al. (2005). Benjamini and Yekutieli (2001) specified conditions under which simul- 
taneous, dependent hypothesis tests, conducted as though they were independent, give 
conservative results; Benjamini and Yekutieli (2005) addressed similar issues in the con- 
text of false coverage-statement rate. Sarkar (2002) extended the work of Benjamini and 
Yekutieli (2001). Efron (2007b) suggested correlation corrections for large-scale simulta- 
neous hypothesis testing. Blair et al. (1996) proposed methods for controlling family-wise 
error rates in multiple procedures, Holland and Cheung (2002) discussed robustness of 
family-wise error rates and Clarke and Hall (2009) discussed robustness of testing pro- 
cedures based on means. 

2. Results and applications 
2.1. Model and main results 

Given $p, n \ge 1$, assume that for $1 \le i \le p$ and $1 \le j \le n$, we observe data $U_{ij}$, which we use
to construct t-statistics $T_i = n^{1/2} \bar U_i / S_i$, where $\bar U_i = n^{-1} \sum_j U_{ij}$ and
$S_i^2 = n^{-1} \sum_j U_{ij}^2 - \bar U_i^2$. In practice, the statistic $T_i$ is used to test the hypothesis
that the $i$th group has zero mean, against a one-sided alternative. When controlling the level of
family-wise error rate (FWER) for step-down tests, we require the values of probabilities
$P(T_i > t$ for $i = i_1, \ldots, i_k)$ for different levels $t$ and different subsets $\{i_1, \ldots, i_k\}$ of
$\{1, \ldots, p\}$. Theorem 1 below will enable us to compute these through approximation by the case
where the $T_i$'s are all independent; see Section 2.3 for further details.

We standardize $S_i^2$ by dividing by $n$, rather than $n - 1$, since the former is more
common in nonparametric problems, but the results below are unaffected by this issue.
Since we studentize, there is no loss of generality in assuming that the variance of each
component equals 1. More particularly, we ask that

$$0 \le E(U_{i1}) = d_i, \qquad \mathrm{var}(U_{i1}) = 1 \ \text{for all } i, \qquad \sup_{i \ge 1} E(|U_{i1}|^3) < \infty, \tag{2.1}$$

where $d_1, d_2, \ldots$ is a sequence of constants. The assumption that $d_i \ge 0$ is made here
because, in the great majority of practical applications, the hypothesis alternative to the 
null entails the zero level being exceeded. Accordingly, the tests are one-sided, hence our 
preoccupation with exceedences of a level. However, minor modifications of our arguments 
permit the two-sided case to be treated. 

Further, we assume that for an integer $\kappa \ge 0$,

the random vectors $(U_{1j}, U_{2j}, \ldots)$, for $j \ge 1$, are independent and identically
distributed, the sequence of random variables $U_{11}, U_{21}, \ldots$ is $\kappa$-dependent        (2.2)
and $\max_{i_1 \ne i_2} \rho_{i_1 i_2} < 1$,

where $\rho_{i_1 i_2} = \mathrm{corr}(U_{i_1 1}, U_{i_2 1})$. The third moment condition in (2.1) permits the variables
$U_{ij}$ to have relatively heavy-tailed distributions, for example, a Pareto distribution with
tail exponent greater than 3.

The assumption of short-range correlation in (2.2) is, of course, an oversimplification, 
but it reflects the low level of correlation that is often observed in practice. For example, 
Messer and Arndt (2006) argue that correlation decays from about 0.08, at a separation 
of approximately two base pairs, to about 0.01 for a separation of ten base pairs. Results 
reported by Mansilla et al. (2004) corroborate these figures if we assume that their data
are normally distributed. More generally, Almirantis and Provata (1999) give evidence 
of both short-range and long-range correlation, depending on the nature of the DNA or 
RNA under investigation. 

The relationship between the group size, n, and the number of hypothesis tests, p, is 
assumed to satisfy 

logp = o(n). (2.3) 

This allows the group size to be very much smaller than the number of tests. In the 
absence of more detailed assumptions about the distributions of the $U_{ij}$'s, (2.3) is
necessary for the theorem we shall give below. To appreciate why, note that if the $U_{ij}$'s
are independent and identically distributed with an atom at zero and, in particular, if
$\delta = P(U_{ij} = 0) > 0$, then, with probability at least $\delta^n$, the t-statistic $T_1$ assumes the
indeterminate value 0/0. In such cases, we shall take $T_1 = 1$, but in order for the theorem
to have a meaningful interpretation when $t$ is the $(1 - p^{-1})$-level quantile of the standard
normal distribution, it is essential that the probability that $T_1 = 0/0$ be of smaller order
than $p^{-1}$. Therefore, we require $\delta^n = o(p^{-1})$ for all $0 < \delta < 1$ and this assumption is
equivalent to (2.3).
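
To see concretely why the group size must outgrow the logarithm of the number of tests in the atom example, one can tabulate $p\delta^n$ for a few group sizes. The sketch below is illustrative only: the function name and the particular values of p, n and delta are assumptions chosen for the example and do not come from the paper.

```python
import math

def expected_degenerate_tests(p, n, delta):
    """Return p * delta**n: the expected number of groups (out of p) whose n
    observations are all exactly zero, when each observation equals zero with
    probability delta. Computed on the log scale to avoid underflow."""
    return math.exp(math.log(p) + n * math.log(delta))

# Hypothetical values: p = 10**6 tests, atom of mass delta = 0.5 at zero.
for n in (10, 20, 40, 80):
    print(n, expected_degenerate_tests(10**6, n, 0.5))
```

With these values the quantity drops below 1 only once n exceeds log p / log(1/delta), which is roughly 20 here; for smaller groups, degenerate 0/0 statistics would be expected among the p tests.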
Define

$$\alpha = \tfrac14 \min_{i_1, i_2:\ i_1 \ne i_2} (1 - \rho_{i_1 i_2}) \tag{2.4}$$
and $\gamma = \alpha + 1$. Condition (2.2) implies that $\alpha > 0$ and, of course, $\alpha \le \tfrac12$. Given $\eta > 0$, let
$t = t(p)$ satisfy

$$(1 + \eta)\sqrt{2\gamma^{-1}\log p} \le t = O(\sqrt{\log p}), \qquad \max_{1 \le i \le p} d_i = o(t/\sqrt{n}), \tag{2.5}$$

where $d_1, d_2, \ldots$ are as in (2.1). If $t$ satisfies the first part of (2.5), then any function $\phi$
which satisfies, as $t \to \infty$,

$$\phi(t) = \exp\{o(t^2)\}\{\exp(-\tfrac14 t^2) + p\exp(-\tfrac12\gamma t^2)\} \tag{2.6}$$

converges to zero as $p \to \infty$. In the arguments in Section 3, we shall use this notation
generically; while $\phi$ will satisfy (2.6), it will alter from one appearance to another. Strictly
speaking, it is not essential to take $p$ to diverge. Although that condition motivates
the assumption of divergent $t$ and is, in turn, motivated by the contemporary
high-dimensional problems that led to this work, it is not necessary for the theorem below.

Theorem 1. If (2.2)-(2.6) hold, then there exists a probability space on which are defined
random variables $T_1^{\rm new}, \ldots, T_p^{\rm new}$ and $T_1', \ldots, T_p'$ such that (i) the joint distribution of
$T_1^{\rm new}, \ldots, T_p^{\rm new}$ is identical to that of $T_1, \ldots, T_p$; (ii) the random variables $T_1', \ldots, T_p'$ are
independent and distributed, respectively, as $T_1, \ldots, T_p$; and (iii) with probability equal to
$1 - \phi(t)$, the exceedences of $t$ by $T_1^{\rm new}, \ldots, T_p^{\rm new}$ occur at the same indices and take the
same values as the exceedences of $t$ by $T_1', \ldots, T_p'$.

To interpret the theorem, note that we would normally expect the dependent data
set $T_1, \ldots, T_p$ to exhibit clusters of level exceedences, rather than the single, isolated
exceedences associated with the independent sequence $T_1', \ldots, T_p'$. The fact that the $T_i$
process (or, equivalently, the $T_i^{\rm new}$ process) behaves like the $T_i'$ process in the case of
large exceedences reflects the fact that, since the marginal distribution of a t-statistic is
relatively light-tailed (if $n$ is sufficiently large; see (2.3)), exceedences of a high level are
rare and so are unlikely to occur together. The case of low-level exceedences is a very
different matter, of course, and so we would expect the theorem to fail if the lower bound
for $t$, in the first part of (2.5), were relaxed too far.

2.2. Applications

In this section, we treat the case of the null hypothesis, where $d_i = 0$ for each $i$. This
would be assumed in most applications of Theorem 1 since it represents the setting that
is conventionally used for calibration.

The theorem implies that, in a strong sense, exceedences down to those of the level
$(1 + \eta)(2\gamma^{-1}\log p)^{1/2}$ are identical to the ones that would occur in the case of independent
tests. Now, the probability associated with an exceedence of $(1 + \eta)(2\gamma^{-1}\log p)^{1/2}$ is,
for small $\eta$, approximately $p^{-1/\gamma}$. Therefore, false discoveries at probability levels of
approximately $p^{-1/\gamma}$, and at lower levels, can be adequately controlled by assuming that
the tests are independent, even when they are not. Note that $\gamma^{-1} < 1$ and that the
false-discovery level controlled by the conventional family-wise error rate is only $p^{-1}$.
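
The relation between the level in the first part of (2.5) and the probability $p^{-1/\gamma}$ can be checked numerically. The following is a minimal sketch under stated assumptions: the marginal tail is replaced by the standard normal tail, the function names are hypothetical, and the values of p, gamma and eta are chosen only for illustration (gamma = 1.225 is the value used in the worked example later in this section).

```python
import math

def exceedance_level(p, gamma, eta):
    """Lower bound (1 + eta) * sqrt(2 * log(p) / gamma) from the first part of (2.5)."""
    return (1.0 + eta) * math.sqrt(2.0 * math.log(p) / gamma)

def normal_upper_tail(t):
    """P(Z > t) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

# Assumed illustrative values, not prescribed by the paper.
p, gamma, eta = 10**6, 1.225, 0.01
level = exceedance_level(p, gamma, eta)
print("level:", level)
print("normal tail at level:", normal_upper_tail(level))
print("p**(-1/gamma):", p ** (-1.0 / gamma))
print("p**(-1):", 1.0 / p)
```

The agreement between the normal tail and $p^{-1/\gamma}$ should be read on a logarithmic scale only, since the exact tail carries additional polynomial factors in the level.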

Next, we discuss the sorts of calculations that are enabled by Theorem 1. Let $Q_j$ denote
the number of indices $i \in [1, p]$ for which $T_i$ lies in the interval $(t_j, t_{j-1}]$, where $j \ge 1$ and $t_j$
is determined by $P(T_1 > t_j) = j\beta/p$, with $\beta > 0$ held fixed. (We take $t_0 = \infty$.) If $T_1, \ldots, T_p$
were independent and identically distributed, then the joint distribution of the $k + 1$ random
variables $Q_1, \ldots, Q_k, p - \sum_j Q_j$ would be exactly multinomial with parameters $p$ and
$q_1, \ldots, q_k, 1 - \sum_j q_j$, where $q_j = P\{T_1 \in (t_j, t_{j-1}]\}$. Theorem 1 implies that for the
dependent process $T_1, \ldots, T_p$, and for any $k_0 = k_0(p)$ for which $t_{k_0}$ satisfies (2.5), any
simultaneous probability calculation based on the multinomial result, but applied to the actual
$T_i$ process rather than an idealized process with independent marginals, is valid, provided
that $k \le k_0$ and the final computed probability is quantified by adding an error which is stated
to be of order $\exp\{o(t_k^2)\}\{\exp(-t_k^2/4) + p\exp(-\gamma t_k^2/2)\}$. The latter probability converges to
zero, even if $k = k_0$ is taken as large as $p^{(\gamma_1 - 1)/\gamma_1}$, where $\gamma_1 \in (1, \gamma)$.

From this point, simultaneous multinomial probability calculations based on $Q_1, \ldots, Q_k$,
familiar from the well-understood case of independent test statistics, can be used to
construct rules for controlling FWER or false discovery rate (FDR); see, for example,
Benjamini and Hochberg (1995). Wang and Hall (2009) have shown that, under the
assumption of finite third moments, highly accurate approximations are available for the
marginal distribution of $T_1$. Such calculations, which justify standard normal, Student's
t- or bootstrap approximations to the marginal distribution of $T_1$, are already widely
used in practice (see Section 1), in conjunction with the independence assumption, when
controlling false discovery rates. Our paper provides justification for these methods.
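
For concreteness, the step-up rule of Benjamini and Hochberg (1995) referred to above can be sketched as follows. This is a generic textbook form of the procedure, written here as an illustration; the function name, the argument names and the example p-values are assumptions and are not taken from the paper.

```python
def benjamini_hochberg(p_values, q):
    """Return the sorted indices of hypotheses rejected by the step-up rule at
    nominal FDR level q, treating the p-values as if they were independent."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: remember the largest rank whose p-value is below rank * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])

# Hypothetical marginal p-values, e.g. from a normal or Student's t approximation.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example, q=0.05))
```

The rule uses only marginal p-values; the point of the present paper is that, at sufficiently high levels, calibrating it as though the statistics were independent is justified.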

More generally, Theorem 1 implies that if a probability statement about what the
process $T_1, \ldots, T_p$ does above the level $t$ is founded on the assumption of independence, then,
no matter how complex or convoluted the statement might be, the claimed probability
level is accurate to within $\phi(t)$.
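
As an informal illustration of this claim, one can simulate a short-range dependent family of t-statistics and count how often exceedences of a level occur at adjacent indices. The sketch below is not the construction used in the proof and rests on several assumptions made only for the example: Gaussian disturbances, a 1-dependent moving average with adjacent correlation 0.5, and hypothetical values of p, n and the levels t.

```python
import math
import random

def dependent_t_statistics(p, n, rng):
    """Simulate p one-sided t-statistics (variance divisor n) from 1-dependent data
    U[i][j] = (e[i][j] + e[i+1][j]) / sqrt(2), with e independent N(0, 1)."""
    e = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p + 1)]
    stats = []
    for i in range(p):
        u = [(e[i][j] + e[i + 1][j]) / math.sqrt(2.0) for j in range(n)]
        mean = sum(u) / n
        s2 = sum(x * x for x in u) / n - mean * mean
        stats.append(math.sqrt(n) * mean / math.sqrt(s2))
    return stats

def exceedance_summary(stats, t):
    """Return (number of exceedences of t, number of adjacent index pairs that both exceed t)."""
    hits = [x > t for x in stats]
    total = sum(hits)
    adjacent_pairs = sum(a and b for a, b in zip(hits, hits[1:]))
    return total, adjacent_pairs

rng = random.Random(0)
stats = dependent_t_statistics(p=2000, n=20, rng=rng)
for t in (1.0, 2.0, 4.0):
    print(t, exceedance_summary(stats, t))
```

At low levels one would expect adjacent exceedences to be common, reflecting the dependence; as the level rises, clustered exceedences should become rare relative to isolated ones.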

To give an example of calculations based on Theorem 1, take $p \le p_0 = 10^6$, $n = 100$ and
$t = 5.052$, the latter denoting the upper $(1 - p_0^{-1})$-level quantile of Student's t-distribution
with $n - 1 = 99$ degrees of freedom. Reflecting empirical evidence given in Section 2.1, take
$\gamma = \tfrac14(1 - 0.1) + 1 = 1.225$. Then, (2.5) is in order; the probability that at least one value of
$p$ independent t-statistics, each on 99 degrees of freedom, exceeds $t = 5.052$ equals 0.010,
0.095 and 0.63 for $p = 10^4$, $10^5$ and $10^6$, respectively; and (2.6) suggests that
these levels are in error by less than 30%, 20% and 0.25%, respectively. Most likely, the
errors are much less than these since the asymptotic bound is derived only as an upper
bound. If we were to make a general probability statement about exceedences of the level
5.052 by the stochastic process of t-statistics, under the assumption of independence,
then, despite the process actually being $\kappa$-dependent rather than independent, we would
expect to make errors no greater than these respective values. In the same general setting,
relative error decreases to zero as $p$ and $t$ increase. For example, in cases where $t$ solves
$1 - \Phi(t) \asymp p^{-1}$, with $\Phi$ denoting the standard normal distribution function, we have

$$\{\exp(-\tfrac14 t^2) + p\exp(-\tfrac12\gamma t^2)\}/\{(1 - \Phi)p\} = O[\exp(-\tfrac14 t^2) + t\exp\{-\tfrac12(\gamma - 1)t^2\}] \to 0 \quad \text{as } t \to \infty.$$
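
The three independent-case probabilities quoted in the example follow directly from the definition of t as the upper $10^{-6}$ quantile. A minimal sketch, assuming only that the marginal exceedence probability is exactly $10^{-6}$ (the function and constant names are hypothetical):

```python
import math

TAIL = 1e-6  # P(T > 5.052) for a t-statistic on 99 d.f., by the definition of t in the text

def prob_at_least_one(p, tail=TAIL):
    """P(at least one of p independent statistics exceeds the level),
    computed as 1 - (1 - tail)**p in a numerically stable way."""
    return -math.expm1(p * math.log1p(-tail))

for p in (10**4, 10**5, 10**6):
    print(p, prob_at_least_one(p))
```

To two significant figures these values agree with the 0.010, 0.095 and 0.63 stated above.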

2.3. Generalizations

Theorem 1 can be extended to other settings, in particular, to those where (a) a wider
range of dependence, obtained by allowing $\kappa$ in (2.2) to diverge with $p$, is allowed; (b) the
value of $n$ for the $i$th group equals $n_i$, depending on $i$, and (2.3) is altered by requiring
that $\log p = o(\min_{i \le p} n_i)$; (c) weights $w_{ij}$ are incorporated into the construction of the
t-statistics $T_i$, by defining $\bar U_i = n_i^{-1}\sum_j w_{ij}U_{ij}$, $S_i^2 = n_i^{-1}\sum_j w_{ij}^2 U_{ij}^2 - \bar U_i^2$ and, as before,
$T_i = n_i^{1/2}\bar U_i/S_i$. Provided the weights satisfy

$$\sup_{i,j} |w_{ij}| \le C_1, \qquad \inf_i n_i^{-1}\,\#\{j: |w_{ij}| > C_2\} \ge C_3,$$

where $C_1, C_2, C_3$ are positive constants not depending on $p$, the proof in this more general
case is as in Section 3. However, the statement of the theorem is then less elegant and
less transparent, so we do not give the more general version here. Incorporation of the
weights $w_{ij}$ permits the scope of the example above to be extended to hypothesis-testing
problems involving linear regression.

To indicate the types of results that can be achieved under longer ranges of dependence, 
we shall discuss the case of a moving average, 

K 

k=l 

where $\kappa = \kappa(p)$ is permitted to diverge to infinity at a rate not exceeding $\log p$ and the
independent disturbances $\varepsilon_{ji}$ are all distributed as $\varepsilon$, for which $E(\varepsilon) = 0$ and $E|\varepsilon|^3 < \infty$.
In this setting, (2.2) holds. We strengthen (2.3) by asking that $\log p = O(n^{1/3})$. The
definition of $t$ implicit in (2.5) can now be refined to

$$t = \sqrt{2\gamma^{-1}(\log p + A\log\log p)},$$

where $A > 0$ denotes a sufficiently large absolute constant. The conclusions of Theorem 1
continue to hold, with a similar proof, if we replace $1 - \phi(t)$ by $1 - o(1)$.

3. Proof of Theorem 1 
3.1. Step 1: Preliminaries 

The notation $D_1, D_2, \ldots$ will denote constants not depending on $n$ or $p$. Let
$Q_i = n^{-1}\sum_j (U_{ij} - d_i)^2$, $R_i = n^{1/2}\bar U_i/Q_i^{1/2}$ and note that

for each $t > 0$, the events $T_i > t$ and $R_i > t/(1 + n^{-1}t^2)^{1/2}$ are identical. (3.1)

Also, note that $R_i = (\sum_j V_{ij} + n d_i)/(\sum_j V_{ij}^2)^{1/2}$, where $V_{ij} = U_{ij} - d_i$, $1 \le j \le n$, are
independent and identically distributed random variables satisfying

$$E(V_{i1}) = 0, \qquad E(V_{i1}^2) = 1 \ \text{for all } i, \qquad \sup_{i \ge 1} E(|V_{i1}|^3) < \infty; \tag{3.2a}$$

cf. (2.1).
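
The equivalence of events in (3.1) is easy to check numerically in the null case. The sketch below assumes $d_i = 0$, so that $Q_i$ reduces to $n^{-1}\sum_j U_{ij}^2 = S_i^2 + \bar U_i^2$; the function names and the Gaussian test data are assumptions made for illustration.

```python
import math
import random

def t_and_r(sample):
    """Return (T, R) for one group, with d_i = 0 assumed:
    T = sqrt(n) * mean / S with S^2 = mean(U^2) - mean^2, and
    R = sqrt(n) * mean / sqrt(Q) with Q = mean(U^2)."""
    n = len(sample)
    mean = sum(sample) / n
    second_moment = sum(u * u for u in sample) / n
    s2 = second_moment - mean * mean
    return (math.sqrt(n) * mean / math.sqrt(s2),
            math.sqrt(n) * mean / math.sqrt(second_moment))

def events_agree(sample, t):
    """Check that {T > t} and {R > t / sqrt(1 + t^2 / n)} coincide for this sample."""
    n = len(sample)
    big_t, big_r = t_and_r(sample)
    return (big_t > t) == (big_r > t / math.sqrt(1.0 + t * t / n))

rng = random.Random(0)
checks = [events_agree([rng.gauss(0.0, 1.0) for _ in range(10)], t)
          for _ in range(200) for t in (0.5, 1.0, 2.0, 3.0)]
print(all(checks))
```

The algebra behind the check is that, for $t > 0$, $n\bar U_i^2 > t^2 S_i^2$ is the same inequality as $n\bar U_i^2/Q_i > t^2/(1 + t^2/n)$ once $S_i^2$ is written as $Q_i - \bar U_i^2$.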

3.2. Step 2: Probabilities of exceedences in ones and twos 

Using results of Wang and Hall (2009) (see also Wang (2005)), it can be shown that, for
constants $D_1, D_2, D_3 > 0$, and whenever $0 < s \le D_1 n^{1/2}$,

$$P(R_i > s) \le D_2 s^{-1}\exp(D_3 s^3 n^{-1/2} - \tfrac12 s^2 + \sqrt{n}\,d_i s). \tag{3.2}$$

We also wish to prove the following related result for pairs of exceedences.

Lemma. Assume the conditions of Theorem 1. There then exist $D_4, D_5 > 0$ such that
for all $i_1, i_2$ with $i_1 \ne i_2$, and for all $0 < s \le D_4 n^{1/2}$, we have

$$\sup_{1 \le |i_1 - i_2| \le \kappa} P(R_{i_1} > s, R_{i_2} > s) \le 5\exp\{-\tfrac12(1 + \alpha)s^2 + D_5 n^{-1/2}s^3 + 2\sqrt{n}(d_{i_1} + d_{i_2})s\}, \tag{3.3}$$

where $\alpha$ is as in (2.4).

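To establish the lemma, we write

$$U^{(1)}_{i_1} = V_{i_1 1} I(|V_{i_1 1}| \le n^{1/2}/s,\ |V_{i_2 1}| \le n^{1/2}/s), \qquad U^{(2)}_{i_1} = V_{i_1 1} - U^{(1)}_{i_1},$$
$$U^{(1)}_{i_2} = V_{i_2 1} I(|V_{i_1 1}| \le n^{1/2}/s,\ |V_{i_2 1}| \le n^{1/2}/s), \qquad U^{(2)}_{i_2} = V_{i_2 1} - U^{(1)}_{i_2}.$$

By virtue of (3.2a), simple calculations show that

$$|E(U^{(1)}_{i_1} + U^{(1)}_{i_2})| \le D_6 s^2/n,$$
$$E\{(U^{(1)}_{i_1})^2 + (U^{(1)}_{i_2})^2\} = 2 + O(1)s/\sqrt{n},$$
$$E(U^{(1)}_{i_1} + U^{(1)}_{i_2})^2 = E\{(V_{i_1 1} + V_{i_2 1}) - (U^{(2)}_{i_1} + U^{(2)}_{i_2})\}^2 \le 2(1 + \rho_{i_1 i_2}) + D_6 s/\sqrt{n},$$
$$E|U^{(1)}_{i_1} + U^{(1)}_{i_2}|^3 \le D_6.$$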
These results, and the bound c x < 1 + | e K (valid for all real x), imply that, 

with h = s/\/n, 

E 



exp ji^i + V i2l ) - \h 2 {V 2 x + V^IdV^l < n^/s, \V l2l \ < n^/s) 



(3.4) 



E 



expUhiV^ + V^-^h^V^ + V^Xm^l^n^/s, or \V i2l \ > n^/s) 



< exp{P(|y il i| > n l ' 2 /s) + P(\V i2l \ > n l ' 2 /s)} 
<D & {s/y/nf. 



(3.5) 



Results (3.4) and (3.5), together with the independence of $V_{i_1 k}$ for each $i_1$, imply that,
for $s \le D_9\sqrt{n}$, with $D_9$ sufficiently small,



E 



<U + (/>«-l)£+Ao 1 



k 

2 1 n 



An n 3 / 2 



s 2 s 3 
<exp<j ( Pij -l)—+D n -j= 



(3.6) 



< exp< 



x E 



Define $\varepsilon^2 = (1 - \rho_{i_1 i_2})/8$. It follows from (3.6) that whenever $s \le D_9\sqrt{n}$, with $D_9$
sufficiently small, we have

wi n = p\2hJ2(Vi lk + V i2k ) - h 2 ^^ + V 2 k ) 

^ k k 

+ 2^/n(d il +d J2 )s>2s 2 (l-e 2 ) 

)(2V^K + d l2 )s - i.s 2 (l - e 2 )| (3.7) 
exJh^iV^k + V i2k ) - \h^(Vnk + V 2 k )\ 

^ k k ' - 

< exp|-i(l + a)s 2 + D 12 -^= + 2^{d il +4Jsj, 
where a is as defined in (2.4). Write 0„ = (1 — e, 1 + e) and note that if < e < |, then 
{E ^fe +«* a > s(nQ? 2 ) 1/2 ,Q l2 G An J 
C [ 2h V i2 k - h 2 V£ k + 2V^d i2 s > s 2 ( 1 - e 2 ) ] , 



k k 

where h = si \fn. It can be shown that 



P(R h > s,Ri 2 > s) 

v i, i, J 



(3.8) 



fc k 
< TTln + 7T2n + 7T 3 „ + TT in + 7T 5 „ , 



where 



^n = P^£,V ilk + nd il ^sinQl) 1 ' 2 ^ >l + e|, 
7r 3n = V hk + nd ^ > s{nQi) 1/2 ,Q>2 > 1 + e}, 

7T 5 „ = P|X1 ^ + " rf *2 > s(«Q? 2 ) V2 'Q^ < 1- e|- 

Property (3.3) will follow from (3.7) and (3.8) if we prove that there exists $D_{13} > 0$ such
that, for $s \le D_{13} n^{1/2}$,

$$\pi_{kn} \le \exp\{-\tfrac12(1 + \alpha)s^2 + D_{13}s^3/\sqrt{n} + 2\sqrt{n}(d_{i_1} + d_{i_2})s\} \tag{3.9}$$

for $k = 2, 3, 4, 5$.

Our proof of (3.9) is based on arguments of Shao (1999) (see also the proof of
Proposition 4.2 of Wang and Hall (2009)) and uses the following result: if $EX = 0$, $EX^2 = 1$
and $E|X|^3 < \infty$, then for any $\lambda > 0$, $\theta > 0$ and $x > 0$,

$$E[\exp\{\lambda bX - \theta(bX)^2\}] = 1 + (\tfrac12\lambda^2 - \theta)n^{-1}x^2 + A(\lambda, \theta)\,n^{-3/2}x^3 E|X|^3, \tag{3.10}$$

where $b = x/\sqrt{n}$ and $A(\lambda, \theta)$ depends only on $\lambda$ and $\theta$. This result is a special case of
Lemma 1 of Shao (1999). Also, note that

K2n < 4n + V ^ + nd >i ^ S ( n QD 1/2 > Qii > 3 } 

< 7r ( 1 ) +7r (2) +7r (3) 
where, noting that y^d^l < s/5, we define 



^ ft ^ k 

4n = p\Y.V llk I{\V llk \ < n^ 2 /s) > 3 S V^/2} 



1/2 

ft 



If the random variable $H$ has the Bi$(n, p)$ distribution and if $a > 0$, then
$P(H \ge an) \le (ep/a)^{an}$ and so

TS?<i , {E J (i^*i> Bl/a /«)^- a } 

< { S - 2 12nP(|^ lfc | > n l / 2 /s)f < ie" s2 

for s < Dny/n, with .D14 sufficiently small. Arguments similar to those in the proof of 
(3.7) yield that < ^e~ s for s < Du^/n with .D14 sufficiently small. To estimate ir^ ■> 
we write Si = {(a;, y): a; > s^/y, s 2 (l + e) 2 < y < 9s 9 }. It follows from (3.10) with A = 1, 
9 = g and X = V^i that, with /i = s/y/n, 



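$$\begin{aligned}
\pi_{2n}^{(3)} &= P\Big\{\Big(h\sum_k V_{i_1 k} + \sqrt{n}\,d_{i_1}s,\ h^2\sum_k V_{i_1 k}^2\Big) \in S_1\Big\} \\
&\le E\Big\{\exp\Big(h\sum_k V_{i_1 k} - \tfrac16 h^2\sum_k V_{i_1 k}^2 + \sqrt{n}\,d_{i_1}s\Big)\Big\}\exp\Big\{-\inf_{(x,y) \in S_1}(x - y/6)\Big\} \\
&\le \exp\{\tfrac13 s^2 - s^2(1 + \varepsilon) + \tfrac16 s^2(1 + \varepsilon)^2 + \sqrt{n}\,d_{i_1}s + D_{15}s^3 n^{-1/2}\} \\
&\le \exp\{-\tfrac12 s^2 - (5\varepsilon s^2/8) + \sqrt{n}\,d_{i_1}s + (D_{15}s^3/\sqrt{n})\} \\
&\le \exp\{-\tfrac12(1 + \alpha)s^2 + \sqrt{n}\,d_{i_1}s + (D_{15}s^3/\sqrt{n})\},
\end{aligned}$$

where we have used the fact that the function $f(y) = s\sqrt{y} - \tfrac16 y$ is increasing in
$s^2(1 + \varepsilon)^2 \le y \le 9s^2$. Combining all of the above estimates, we obtain

$$\pi_{2n} \le \exp\{-\tfrac12(1 + \alpha)s^2 + \sqrt{n}\,d_{i_1}s + (D_{15}s^3/\sqrt{n})\}.$$

Similarly, we may prove (3.9) for $k = 3$.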
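Put $S_2 = \{(x, y): x \ge s\sqrt{y},\ y \le (1 - \varepsilon)^2 s^2\}$. It follows from (3.10) with $\lambda = 1$, $\theta = 2$ and
$X = V_{i_1 1}$ that, with $h = s/\sqrt{n}$,

$$\begin{aligned}
\pi_{4n} &= P\Big\{\Big(h\sum_k V_{i_1 k} + \sqrt{n}\,d_{i_1}s,\ h^2\sum_k V_{i_1 k}^2\Big) \in S_2\Big\} \\
&\le E\Big\{\exp\Big(h\sum_k V_{i_1 k} - 2h^2\sum_k V_{i_1 k}^2 + \sqrt{n}\,d_{i_1}s\Big)\Big\}\exp\Big\{-\inf_{(x,y) \in S_2}(x - 2y)\Big\} \\
&\le \exp\{-1.5s^2 - s^2(1 - \varepsilon) + 2s^2(1 - \varepsilon)^2 + \sqrt{n}\,d_{i_1}s + (D_{16}s^3/\sqrt{n})\} \\
&\le \exp\{-\tfrac12 s^2 - 2\varepsilon s^2 + \sqrt{n}\,d_{i_1}s + (D_{16}s^3/\sqrt{n})\} \\
&\le \exp\{-\tfrac12(1 + \alpha)s^2 + \sqrt{n}\,d_{i_1}s + (D_{16}s^3/\sqrt{n})\}.
\end{aligned}$$

Similarly, we may prove (3.9) for $k = 5$. This completes the derivation of (3.9) and, hence,
also the proof of the lemma.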

3.3. Step 3: Blocks and expected numbers of level exceedences 

Partition the set of positive integers into small blocks, each of length $\kappa + 1$, where $\kappa$ is
as in (2.2), and large blocks, each of length $\ell$, where $\ell$ is a divergent function of $p$. We
shall take

$$\ell \sim \exp(\tfrac14 s^2), \tag{3.11}$$

where $s \to \infty$ as $p$ increases. The integers in each block are consecutive, each consecutive
pair of large blocks is separated by a small block and the block furthest to the left is a
large block. Let the small blocks be $b_1, b_2, \ldots$ and the large blocks be $B_1, B_2, \ldots$, indexed
such that the order of the blocks is $B_1, b_1, B_2, b_2, \ldots$. Let $B = B_1 = \{1, \ldots, \ell\}$ denote the
first large block and let $N_1$ be the number of indices $i \in B$ for which $R_i > s$. We wish
to develop a bound for $E\{N_1 I(N_1 \ge 2)\}$. Identical bounds can be derived, uniformly in
the block indices, for the versions of $N_1$ in the case of blocks $B_2, B_3, \ldots$; for notational
simplicity, we focus solely on $B_1$.
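
The blocking scheme can be made concrete with a short sketch. The function below is an illustration under assumptions: its name and arguments are hypothetical, and it simply truncates the final block at p, whereas the proof treats block fragments that overlap the end point separately.

```python
def partition_blocks(p, ell, kappa):
    """Split the indices 1..p into alternating large blocks (length ell) and
    small blocks (length kappa + 1), starting with a large block.
    The last block is truncated at p."""
    large, small = [], []
    start = 1
    while start <= p:
        end = min(start + ell - 1, p)
        large.append(range(start, end + 1))
        start = end + 1
        if start > p:
            break
        end = min(start + kappa, p)
        small.append(range(start, end + 1))
        start = end + 1
    return large, small

large, small = partition_blocks(p=20, ell=5, kappa=1)
print([list(b) for b in large])
print([list(b) for b in small])
```

Because consecutive large blocks are separated by kappa + 1 indices, variables in different large blocks are more than kappa apart, which is what makes the large blocks mutually independent under kappa-dependence.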
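By Hölder's inequality,

$$E\{N_1 I(N_1 \ge 2)\} \le (E N_1^{a_1})^{1/a_1} P(N_1 \ge 2)^{1/a_2}, \tag{3.12}$$

where $a_1, a_2 > 1$ satisfy $a_1^{-1} + a_2^{-1} = 1$. Define $d^0 = \sqrt{n}\max_{1 \le i \le p} d_i$. In view of (3.2) and
(3.3),

$$\begin{aligned}
P(N_1 \ge 2) &= P(\text{for some } i_1, i_2 \in B \text{ with } i_1 < i_2,\ R_{i_1}, R_{i_2} > s) \\
&\le \sum_{i_1=1}^{\ell-1}\ \sum_{i_2=i_1+1}^{\ell} P(R_{i_1} > s, R_{i_2} > s) \\
&= \sum_{i_1=1}^{\ell-1}\ \sum_{i_2=i_1+1}^{\min(i_1+\kappa+1,\,\ell)} P(R_{i_1} > s, R_{i_2} > s)
 + \sum_{i_1=1}^{\ell-1}\ \sum_{i_2=\min(i_1+\kappa+2,\,\ell)}^{\ell} P(R_{i_1} > s, R_{i_2} > s) \\
&\le D_{17}\exp(D_{18}s^3 n^{-1/2} + D_{19}d^0 s)\,[\ell\exp\{-\tfrac12(\alpha + 1)s^2\} + \ell^2\exp(-s^2)].
\end{aligned} \tag{3.13}$$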
Noting that $N_1$ can be written as $\kappa + 1$ sums of $\ell/(\kappa + 1)$ independent and identically
distributed random variables and using calculations based on the binomial distribution,
it can be shown that, for the choice of $\ell$ at (3.11), $E(N_1^{a_1})$ is bounded as $p \to \infty$ for each
$a_1 > 0$. Hence, using (3.12) and (3.13), we deduce that for each $\eta_2 \in (0, 1)$,

$$E\{N_1 I(N_1 \ge 2)\} \le D_{20}\exp(D_{21}s^3 n^{-1/2} + D_{22}d^0 s) \times [\ell\exp\{-\tfrac12(\alpha + 1)s^2\} + \ell^2\exp(-s^2)]^{1 - \eta_2}. \tag{3.14}$$

Write $N_2$ for the number of exceedences of $s$ that occur in the union of the small
blocks $b_j$ that intersect the interval $[1, p]$. There are $O(p/\ell)$ such small blocks and each
is of length $\kappa + 1$, so, by (3.2),

$$E(N_2) \le D_{23}\,p\,\ell^{-1}P(R_1 > s) \le D_{24}\,p\,\ell^{-1}\exp(D_3 s^3 n^{-1/2} - \tfrac12 s^2 + d^0 s). \tag{3.15}$$

Provided we choose $s = s(p)$ to diverge to infinity in such a manner that

$$s = O(\sqrt{\log p}), \qquad d^0 = o(s), \tag{3.16}$$

it follows from (2.3) that $s^3 n^{-1/2} + d^0 s = o(s^2)$ and so (3.14) entails that

$$E\{N_1 I(N_1 \ge 2)\} = \exp\{o(s^2)\}[\ell\exp\{-\tfrac12(\alpha + 1)s^2\} + \ell^2\exp(-s^2)]^{1 - \eta_2}.$$

Since this is true for each $\eta_2 > 0$, we have

$$E\{N_1 I(N_1 \ge 2)\} = \exp\{o(s^2)\}[\ell\exp\{-\tfrac12(\alpha + 1)s^2\} + \ell^2\exp(-s^2)]. \tag{3.17}$$

3.4. Step 4: Bound for $P(N_1 \ge 1)$, and related bounds

Let $N_3$ denote the number of exceedences of $s$ which come from large blocks $B_j$,
$1 \le j \le m$, that have two or more exceedences. Write $\sum_j \pi_j$ for the sum over $1 \le j \le m$
of the probability $\pi_j$ that $R_i > s$ for some $i \in B_j$. Then (a) the expected number of
exceedences of $s$ by $R_1, \ldots, R_p$ equals $\sum_{i \le p} P(R_i > s)$ and is less than or equal to
$\sum_j \pi_j + E(N_2) + E(N_3)$; (b) the expected number of exceedences in (a) is greater than or equal
to $\sum_{j \le m-1} \pi_j$; and (c) since $P(N_1 \ge 1) \le E(N_1) = \ell P(R_1 > s)$ and $P(R_1 > s)$ satisfies
(3.2), we have

$$\pi_1 = P(N_1 \ge 1) \le P^0 \equiv D_2 s^{-1}\ell\exp(D_3 s^3 n^{-1/2} - \tfrac12 s^2 + d^0 s) \tag{3.18}$$

and an identical bound holds for $\pi_2, \ldots, \pi_m$, in particular, (d) $\pi_m \le P^0$. Results (a)-(d)
imply that

$$\Big|\sum_{j=1}^m \pi_j - \sum_{i=1}^p P(R_i > s)\Big| \le E(N_2) + E(N_3) + P^0. \tag{3.19}$$

Since $E(N_3) \le m E\{N_1 I(N_1 \ge 2)\}$, $m = O(p/\ell)$ and bounds for $E\{N_1 I(N_1 \ge 2)\}$, $E(N_2)$
and $P(N_1 \ge 1)$ are given by (3.17), (3.15) and (3.18), it follows that (3.19) entails, on
taking $\ell$ as in (3.11),

$$\Big|\sum_{j=1}^m \pi_j - \sum_{i=1}^p P(R_i > s)\Big| \le \exp\{-\tfrac14 s^2 + o(s^2)\}\{1 + p\exp(-\tfrac14 s^2 - \tfrac12\alpha s^2)\}. \tag{3.20}$$


3.5. Step 5: Probabilities of level exceedences 

Let $\mathcal{F}$ denote the event that (a) there are no exceedences of $s$ in any of the small
blocks that are wholly contained within $[1, p]$; (b) in each of the large blocks that is
wholly contained within $[1, p]$, there is at most one exceedence of $s$; and (c) there are no
exceedences of $s$ in any block fragment that overlaps the end point $p$. Write $\mathcal{G}$ for the
complement of $\mathcal{F}$. Results (3.15), (3.17) and (3.18) imply that, with $\ell$ given by (3.11)
and assuming that (3.16) holds,

$$P(\mathcal{G}) \le \exp\{-\tfrac14 s^2 + o(s^2)\}\{1 + p\exp(-\tfrac14 s^2 - \tfrac12\alpha s^2)\}. \tag{3.21}$$

Therefore, in order for $P(\mathcal{G}) \to 0$, it is sufficient that for some $\eta_3 \in (0, 1)$ and all sufficiently
large $p$, we have

$$(1 + \eta_3)\sqrt{2\gamma^{-1}\log p} \le s = O(\sqrt{\log p}), \tag{3.22}$$

where $\gamma$ is as defined in Section 2. This choice of $s$ satisfies (3.16) and so if $s$ is given by
(3.22), then $P(\mathcal{G})$ satisfies (3.21).



3.6. Step 6: Strong approximation 

Let $M_j$, $1 \le j \le m$, be the number of times that $R_i > s$ for $i \in B_j$. Then, the number, $N$,
say, of blocks $B_j$ for which $M_j \ge 1$ is distributed as $\sum_j I_j$, where the random variables $I_j$
are independent, $I_j = 1$ if $M_j \ge 1$ and $I_j = 0$ otherwise. As before, we define $\pi_j = P(M_j \ge 1)$.
Conditional on $N$ and on the events "$M_{j_1} \ge 1$" and "$M_{j_2} \ge 1$," where $1 \le j_1 < j_2 \le m$,
the sequences $\{R_i: i \in B_{j_1}\}$ and $\{R_i: i \in B_{j_2}\}$ are independent.

Order the blocks $B_j$ for which $M_j \ge 1$, giving $B_{J_1}, \ldots, B_{J_N}$, where $1 \le J_1 < \cdots < J_N \le m$,
and let $W_k$ denote a value of $R_i$ for which $R_i > s$, randomly chosen among such
values for which $i \in B_{J_k}$. Write $i = I_k$ for the index of the value of $R_i$ that is chosen
as $W_k$. Then, conditional on $N$, the random variables $W_1, \ldots, W_N$ are independent and
identically distributed as $R(s)$. $J_1, \ldots, J_N$ is a set of integers chosen independently and
randomly from $1, \ldots, m$ and $I_k$ is uniformly distributed among indices in $B_{J_k}$.

Let $R_1', \ldots, R_p'$ be independent random variables having the distributions of $R_1, \ldots, R_p$,
respectively, let $M_j'$ denote the number of times that $R_i'$ exceeds $s$ for $i \in B_j$ and
put $\pi_j' = P(M_j' \ge 1)$. The number $N'$ of blocks $B_j$ for which $M_j' \ge 1$ is distributed
as $\sum_j I_j'$, where the random variables $I_j'$ are independent and $I_j' = 1$ if $M_j' \ge 1$, $I_j' = 0$
otherwise. An argument similar to, but simpler than, that leading to (3.20) shows that



$$\sum_{j=1}^m |\pi_j - \pi_j'| \le \exp\{-\tfrac14 s^2 + o(s^2)\}\{1 + p\exp(-\tfrac14 s^2 - \tfrac12\alpha s^2)\}. \tag{3.23}$$

By enlarging the probability space if necessary, we can think of $N$ as denoting the
number out of $m$ independent random variables $U_1, \ldots, U_m$, each uniformly distributed
on $[0, 1]$, which lie in the respective intervals $[0, \pi_j]$. Take $N'$ to be the number of $U_j$'s
that lie in $[0, \pi_j']$. Then,

$$P(N = N') \ge 1 - \sum_{j=1}^m |\pi_j - \pi_j'|. \tag{3.24}$$
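
The coupling behind (3.24) can be illustrated directly: driving both sets of indicators with the same uniform variables makes the two counts differ only when some uniform falls between $\pi_j$ and $\pi_j'$. The sketch below is a toy illustration; the function names and the probability vectors are hypothetical and are not derived from the blocks in the proof.

```python
import random

def coupled_counts(pi, pi_prime, rng):
    """Draw one common uniform per block and return (N, N'), the numbers of
    uniforms falling in [0, pi_j] and in [0, pi_prime_j], respectively."""
    n = n_prime = 0
    for a, b in zip(pi, pi_prime):
        u = rng.random()
        n += u <= a
        n_prime += u <= b
    return n, n_prime

def mismatch_bound(pi, pi_prime):
    """Upper bound sum_j |pi_j - pi'_j| on P(N != N') under this coupling."""
    return sum(abs(a - b) for a, b in zip(pi, pi_prime))

# Hypothetical block probabilities, chosen only for illustration.
pi = [0.10] * 20
pi_prime = [0.11] * 20
rng = random.Random(0)
trials = 5000
mismatches = 0
for _ in range(trials):
    n, n_prime = coupled_counts(pi, pi_prime, rng)
    mismatches += n != n_prime
print(mismatches / trials, mismatch_bound(pi, pi_prime))
```

Each pair of indicators disagrees with probability $|\pi_j - \pi_j'|$, so the union bound gives the inequality; the empirical mismatch frequency should sit at or below the printed bound, up to simulation error.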

We have already constructed sequences $W_1, \ldots, W_N$, $I_1, \ldots, I_N$ and $J_1, \ldots, J_N$. If
$N' > N$, then, conditional on these quantities and on $N$ and $N'$, we select new
values $W_{N+1}, \ldots, W_{N'}$, $I_{N+1}, \ldots, I_{N'}$ and $J_{N+1}, \ldots, J_{N'}$ which are independent of
$W_1, \ldots, W_N$, $I_1, \ldots, I_N$ and $J_1, \ldots, J_N$, with $W_{N+1}, \ldots, W_{N'}$ independently distributed
as $R(s)$, the values of $J_{N+1}, \ldots, J_{N'}$ independently and uniformly distributed among
$\{1, \ldots, m\} \setminus \{J_1, \ldots, J_N\}$ and the values of $I_{N+1}, \ldots, I_{N'}$ uniformly distributed within
the blocks $B_{J_{N+1}}, \ldots, B_{J_{N'}}$, respectively. In this instance, we take $W_1', \ldots, W_{N'}'$ and
$I_1', \ldots, I_{N'}'$ to be identical to $W_1, \ldots, W_{N'}$ and $I_1, \ldots, I_{N'}$, respectively. If $N' < N$,
then we take $(W_1', J_1'), \ldots, (W_{N'}', J_{N'}')$ to be the (exceedence, block index) pairs that
remain after randomly and independently deleting $N - N'$ pairs from the sequence
$(W_1, J_1), \ldots, (W_N, J_N)$.

Let $N_0$ denote the number of exceedences of $s$ by $R_1', \ldots, R_p'$ and let $N'$ represent the
number of large blocks $B_j$ in which there is at least one exceedence of $s$ by the sequence
$R_1', \ldots, R_p'$. Then, $P(N_0 \ge N') = 1$. Conditional on $N_0$ and $N'$, let $W_{N'+1}', \ldots, W_{N_0}'$
denote independent and identically distributed random variables, all distributed as $R(s)$, and
distribute the locations $I_{N'+1}', \ldots, I_{N_0}'$ of these exceedences independently and uniformly
over the points $\{1, \ldots, p\} \setminus \{I_1', \ldots, I_{N'}'\}$, conditional on all of the variables $N'$, $N_0$,
$W_1', \ldots, W_{N'}'$ and $J_1', \ldots, J_{N'}'$. Take the values of $R_1', \ldots, R_p'$ that exceed $s$ to be the
variables $W_1', \ldots, W_{N_0}'$ and let the locations of those exceedences be the points $I_1', \ldots, I_{N_0}'$. By
construction, $W_1', \ldots, W_{N_0}'$ are distributed as the exceedences of $s$ by $p$ independent and
identically distributed random variables distributed as $R(s)$; conjointly, $I_1', \ldots, I_{N_0}'$ are
distributed as the locations of those exceedences and the probability that $N_0 = N' = N$,
$M_j \in \{0, 1\}$ for each $j \in [1, m]$ and there are no exceedences of $s$ in any of the small
blocks $b_j$ for any $j \in [1, m]$ is bounded below by $1 - \tau(s)$, where $\tau(s)$ satisfies (2.6); see
also (3.20), (3.21), (3.23) and (3.24).

Hence, provided that $s$ satisfies (3.22), we may construct a sequence $R_1', \ldots, R_p'$ of
independent variables with the same marginal distribution as $R_1$ and such that, with
probability bounded below by $1 - \tau(s)$, the exceedences of $R_1, \ldots, R_p$ over $s$ are identical
to those of $R_1', \ldots, R_p'$. The theorem follows from this property, (2.3) and (3.1), on taking
$s = t$.